-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Liquid Clustering config for table materialization #398
Liquid Clustering config for table materialization #398
Conversation
Signed-off-by: Ammar Chalifah <[email protected]>
Signed-off-by: Ammar Chalifah <[email protected]>
Signed-off-by: Ammar Chalifah <[email protected]>
Signed-off-by: Ammar Chalifah <[email protected]>
Signed-off-by: Ammar Chalifah <[email protected]>
288eaa9
to
d2d43f9
Compare
Signed-off-by: Ammar Chalifah <[email protected]>
Hi, could you help review this PR? @andrefurlan-db @susodapop @ueshin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good with one question 👍 Thanks for this contribution.
@@ -154,6 +176,7 @@ def test_macros_create_table_as_all_delta(self): | |||
"create or replace table my_table " | |||
"using delta " | |||
"partitioned by (partition_1,partition_2) " | |||
"cluster by (cluster_1,cluster_2)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the docs (emphasis mine):
Clustering is not compatible with partitioning or ZORDER, and requires that the Azure Databricks client manages all layout and optimization operations for data in your table.
Therefore the output of this self._render_create_as()
, while expected, would technically raise an exception. Should we enforce any kind of block or just allow the compute cluster to raise an exception if the dbt user configures both partition_by
and liquid_clustered_by
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. I'm for leaving the compute cluster to raise an exception, but would be open for suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pinging a couple other engineers internally for an opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My stance has been to let Databricks surface the issues, rather than intercepting in the adapter. I'm not sure if we're doing this consistently everywhere. My reasoning is that, for any rule we hard code in, the Databricks implementation could change, and it would be better for that functionality to be available to customers without us having to spin a new release.
@ammarchalifah can you file an issue for us to add this capability for python models as well? The python model support is currently in flux, so no need for you to make the change, but I would appreciate if you capture the expected behavior (matching your sql tests) in an issue. |
Yes, opened a new issue here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! I've updated this to merge to a staging branch on the main repository. This enables the staging branch to run our e2e tests. Once those pass, we'll merge to main
@susodapop Thank you! Really happy to contribute and can't wait to try out Liquid Clustering in our system! |
All tests passed. Should we merge the PR? @susodapop |
Any update with this PR? @susodapop |
154e5c6
into
databricks:staging-liquid-clustering
Signed-off-by: Ammar Chalifah <[email protected]> Co-authored-by: Ammar Chalifah <[email protected]>
Signed-off-by: Ammar Chalifah <[email protected]> Co-authored-by: Ammar Chalifah <[email protected]> (cherry picked from commit b632484)
Resolves #399
Description
With this change, dbt user could supply
liquid_clustered_by
model config with column (or list of columns) to be used as clustering keys using Liquid Clustering feature of Delta. The change introduces a small change in DDL query, while it does nothing forpython
-based models (as I couldn't find the python API for Liquid Clustering).Checklist
CHANGELOG.md
and added information about my change to the "dbt-databricks next" section.